Experiment: pratte17
We first look at how leaving out individual set sizes affects the in-sample fits, the out-of-sample prediction of the data, and the associated predictions of all models. Here, all models are fit to all (remaining) set sizes simultaneously, i.e., the mean memory precision for each set size is derived from the mean memory precision for a single item. Please see the manuscript for a more detailed explanation of the approach shown here.
By experiment-level data, we mean the data in pratte17 as the collection of the data provided by all 12 participants. The manner in which the data, model fitting results and predictions of an experiment are summarized can vary depending on the goal and approach. For some aspects of the data, some approaches make more sense than others. For completeness’ sake we show a variety here, even if some are somewhat nonsensical.
Here we treated all data in pratte17 as if it had been provided by a single participant (rather than by 12 participants). We fitted all models to this aggregate data. The model comparison shows relative model fits on the basis of the fit to the aggregate data.
In the error distribution and summary statistics, the data are the observed data aggregated across individuals. The predictions (error distributions, resultant summary statistics and normalized RMSD relative to the aggregate data) are based on each model’s best-fit parameter estimates from the fit to the aggregate data.
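As a rough illustration of the normalized RMSD mentioned above, the following minimal sketch compares predicted and observed summary statistics across set sizes. The normalization used here (dividing by the range of the observed values) is an assumption, since the text does not state which normalization was applied; the function name `nrmsd` and all numbers are hypothetical.

```python
import numpy as np

def nrmsd(observed, predicted):
    """RMSD between predicted and observed values, divided by the
    observed range (an ASSUMED normalization, not taken from the text)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmsd = np.sqrt(np.mean((predicted - observed) ** 2))
    return float(rmsd / (observed.max() - observed.min()))

# Hypothetical values of one summary statistic at four set sizes.
observed = [0.20, 0.35, 0.50, 0.60]
predicted = [0.22, 0.33, 0.52, 0.58]

score = nrmsd(observed, predicted)
```

A smaller value indicates that the predicted summary statistics track the observed ones more closely; a value of 0 means they coincide.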
In this tab, all graphs are shown for the purpose of illustrating the performance in the experiment on average, i.e., averaged across participants.
The model comparison plot is based on determining and averaging the relative difference of all individual model fits to the best-fit model. Across the data corpus, some participants produced extreme out-of-sample predictions (> 10000 points) for some held-out set sizes. Where this is the case we present two model comparison graphs: (A) shows the model comparison with all participants included and, where applicable, (B) shows the model comparison with those participants excluded from all panels.
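The averaging and exclusion steps described above can be sketched as follows. This is only an illustration under stated assumptions: the deviance values are made up, the array layout (participants by models) is hypothetical, and applying the 10000-point criterion to the difference from each participant's best model (rather than to the raw value) is an assumption.

```python
import numpy as np

# Hypothetical deviances: rows = participants, columns = models.
deviance = np.array([
    [1010.0, 1000.0, 1025.0],
    [990.0, 995.0, 980.0],
    [1200.0, 1150.0, 30000.0],  # one extreme out-of-sample prediction
])

THRESHOLD = 10000.0  # the "> 10000 points" criterion from the text

def difference_from_best(dev):
    """Per participant, each model's difference from that participant's best model."""
    return dev - dev.min(axis=1, keepdims=True)

delta = difference_from_best(deviance)

# ASSUMPTION: a participant is flagged if any model exceeds the threshold.
extreme = (delta > THRESHOLD).any(axis=1)

panel_a = delta.mean(axis=0)            # (A) all participants included
panel_b = delta[~extreme].mean(axis=0)  # (B) flagged participants excluded
```

In this toy example the third participant dominates the average for the third model in `panel_a`, which is the situation the second graph is meant to guard against.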
Here in pratte17, we excluded 0 participants for extreme out-of-sample predictions (> 10000 points); the number of participants in each panel is shown at the top of each panel.
The predictions of the behavioral signature pattern (error distribution and resultant summary statistics) are based on averaging the parameter estimates of the best-fit individual fits. The observed data is given by the aggregate data for both graphs. If any participants were excluded for extreme out-of-sample prediction in the model comparison, they were also excluded here. Thus the graphs below represent the data and predictions of 12 participants.
In the tab “Averaging individual parameter estimates”, we looked at the prediction of error distributions and resultant summary statistics on the basis of averaging participants’ best-fit parameter estimates. Here, the predictions are based on averaging individuals’ predictions (which were based on best-fit parameter estimates). We excluded 0 participants, as in “Averaging individual parameter estimates”, before aggregating/averaging individuals’ predictions. For the summary statistics (across set sizes, and normalized RMSD) we show two graphs: A) deriving the summary statistics from the averaged error distributions (for NRMSDs, compared to the summary statistics of the aggregate data), and B) averaging individuals’ summary statistics, and similarly aggregating individuals’ normalized RMSDs.
Plots labeled “A”. The following summary statistics are derived from the averaged error distribution above. The graphs with the normalized RMSD show the comparison of these averaged predictions to the aggregate data. This is not particularly useful, as the distribution underlying the aggregate data is not necessarily the average of the individuals’ data, but it is one approach to summarizing the data.
Plots labeled “B”. The following summary statistics are based on averaging participants’ summary statistics, for both the data (i.e., the summary statistics derived from individual observed error distributions) and the model predictions (i.e., the summary statistics derived from individual predicted error distributions). The graphs with the normalized RMSD show the normalized RMSD across individuals as boxplots, to give an idea of the spread of the NRMSD in the experiment for these models (i.e., NOT the normalized RMSD derived from contrasting averaged predicted and observed summary statistics).
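The difference between approaches A and B can be illustrated with a minimal sketch. Everything here is hypothetical: the predicted error distributions are Gaussian-shaped histograms with made-up widths, and the circular standard deviation is used only as an example of a summary statistic, not necessarily one used in the report.

```python
import numpy as np

# Error bins in radians, covering the full circle.
edges = np.linspace(-np.pi, np.pi, 181)
centers = 0.5 * (edges[:-1] + edges[1:])

def predicted_distribution(sd):
    """Hypothetical predicted error distribution as a normalized histogram."""
    p = np.exp(-0.5 * (centers / sd) ** 2)
    return p / p.sum()

def circular_sd(p):
    """Circular standard deviation, sqrt(-2 ln R), of a binned distribution."""
    r = np.abs(np.sum(p * np.exp(1j * centers)))
    return float(np.sqrt(-2.0 * np.log(r)))

# Three hypothetical participants with different predicted widths.
individual = [predicted_distribution(sd) for sd in (0.3, 0.6, 1.0)]

# Approach A: average the distributions, then derive the statistic.
stat_a = circular_sd(np.mean(individual, axis=0))

# Approach B: derive the statistic per participant, then average.
stat_b = float(np.mean([circular_sd(p) for p in individual]))
```

Because the statistic is a nonlinear function of the distribution, `stat_a` and `stat_b` generally differ, which is why the two sets of plots need not agree.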
It is possible that the effect of set size 1 is due to the data at set size 1 being particularly diagnostic of a difference between models. We therefore fitted the VP models with unlimited memory capacity (+/- models) to the data of each set size separately. This means that all parameters were estimated separately for each set size, and mean memory precision was not assumed to be linked across set sizes by the power function.
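To make the contrast between linked and separate fits concrete, here is a minimal sketch. It assumes the power function takes the form J(N) = J1 * N^(-alpha), a common parameterization in variable-precision models; the exact form used in the manuscript is not stated in this section, and all parameter values and set sizes below are hypothetical.

```python
import numpy as np

def mean_precision(set_size, j1, alpha):
    """Mean memory precision at a given set size.

    ASSUMED form: J(N) = J1 * N**(-alpha), where J1 is the mean
    precision for a single item.
    """
    return j1 * np.asarray(set_size, dtype=float) ** (-alpha)

set_sizes = np.array([1, 2, 4, 8])  # hypothetical set sizes

# Linked fit: two free parameters determine precision at every set size.
linked = mean_precision(set_sizes, j1=40.0, alpha=1.0)
n_free_linked = 2

# Separate fits: one free mean precision per set size (made-up values),
# with no constraint tying them together.
separate = {1: 38.0, 2: 24.0, 4: 9.0, 8: 6.5}
n_free_separate = len(separate)
```

In the separate fits the estimates are free to depart from any power-law pattern, at the cost of more free parameters for mean precision.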
We compared the models’ fits in terms of deviance, alongside the relative difference between models when data from all set sizes were fit simultaneously.
A. Based on the summary statistics derived from averaged error distributions.
B. Based on summary statistics averaged across participants (derived from individual best-fit predictions of the error distribution).